home *** CD-ROM | disk | FTP | other *** search
- HACK.DOC
-
- The internals of Citadel -- not
- for the weak-hearted
-
- OVERVIEW
-
- The fundamental structure of the system is very simple. CTDLMSG.SYS
- is a circular file. New messages are simply written around the
- buffer in an endless circle, overwriting old messages in their way.
- There is no other way of deleting messages, and text is never
- shuffled on disk. Messages are numbered consecutively and start with
- an FF (hex) byte. Except for this FF start-of-message byte, all
- bytes in the message file have the high bit set to 0. This means
- that in principle it is trivial to scan through the message file and
- locate message N if it exists, or return error. (Complexities, as
- usual, crop up when we try for efficiency...)
-
- Each room is basically just a list of message numbers. Each
- time we enter a new message in a room, we slide all the old message-
- numbers down a slot, and probably the oldest one falls off the bottom.
- Reading a room is just a matter looking up the messages one by one
- and printing them out. If the message has been overwritten already,
- we just ignore it.
-
- Implementing the "new message" function is also trivial in
- principle: we just keep track, for each caller in the userlog, of
- the highest-numbered message which existed on the >last< call.
- (Remember, message numbers are simply assigned sequentially each
- time a message is created. This sequence is global to the entire
- system, not local within a room.) If we ignore all message-numbers
- in the room less than this, only new messages will be printed. Voila!
- Code up the system described thus far, and you'll have a good
- approximation to Version 1. Better stop and reread everything to
- here, so you can pick out the fundamental mechanisms among all of
- Version 2's bells and whistles.
-
-
- ----------------------------------------------------------------------
-
- message format on disk (CTDLMSG.SYS)
-
-
- Message format has changed relative to V1, in the direction of
- using more disk space and providing greater flexibility.
-
- A message now consists of a sequence of character strings. Each
- string begins with a type byte indicating the meaning of the string
- and is ended with a null. All strings are printable ASCII: in
- particular, all numbers are in ASCII rather than binary. This is
- for simplicity, both in implementing the system and in implementing
- other code to work with the system. For instance, a database driven
- off Citadel archives can do wildcard matching without worrying about
- unpacking binary data such as dates first. To provide later
- downward compatability, all software should be written to IGNORE
- fields not currently defined.
-
-
- The type bytes currently defined are:
-
- BYTE Mnemonic Comments
-
- 0xFF Start-of-message indicator. Followed by local
- message ID# as ASCII string, null-terminated as
- always. This byte is the >only< byte which has
- the high bit set in a Citadel message.buf file.
- This field must be present in every message.
- A Author Name of originator of message.
- D Date Date message was created.
- C Time Time message was created.
- M Message Text of message. Is last field in a message, by
- definition. Following data will be ignored.
- This field must be present in every message.
- N Name Human name for node originated on. Used on
- title line of foreign messages. Ex:
- ODD-DATA
- will produce a title message something like
- 82Nov23 from Cynbe ru Taren @ODD-DATA
- O Origin ID of node message originated on: Country code plus
- phone number of system. (Not stored for locally
- originated messages.) Ex:
- US 206 633 3282
- R Room Room of origin. Topic.
- S Source ID# Message ID # on system message was created on.
- Two 16-bit integers (high and low halves of
- full 32-bit ID#) separated by a blank. Ex:
- 0 13654
- T To Addressee. Used only for private messages in Mail>,
- in version 2.00 .
- n Reserved for future use.
-
- EXAMPLE
-
- Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte. Then a
- message which prints as
-
- LOGLAN> read new
-
- 82Nov04 From James Cooke Brown
- Loi uiue i Ti logla
-
- LOGLAN>
-
- might be stored as
-
- <FF>0 3583<0>D82Nov04<0>AJames Cooke Brown<0>RLOGLAN<0>MLoi uiue i Ti logla<0>
- |--Local ID--|---Date---|-----Author---------|--Room---|-------Message--------|
-
- The date, room and author fields could be in any order. Not all
- fields are printed by default. The local ID# and Room field are
- suppressed here. An isolated system will not normally have use for
- fields beyond those used in this example.
-
- Lines are marked with C NewLine (ASCII LF) characters, within
- the message text proper.
-
-
-
- ----------------------------------------------------------------------
-
- NETWORKING
-
-
- Citadel nodes network by sharing one or more rooms. Any Citadel
- node can choose to keep an image of any public room on any other
- Citadel node (subject to willingness to foot the phone bills, of
- course!). The procedure in essence simply involves calling the
- imaged node up periodically and copying any new messages in the
- imaged room into the local image.
-
- There is no necessary reciprocity or pre-arrangement, although
- convenience and politeness respectively suggest both. The node
- which gets the information foots the phone bill for the transaction.
- This promotes simple and harmonious relations between the nodes.
-
- Complexities arise primarily from the possibility of densely
- connected networks: one does not wish to accumulate multiple copies
- of a given message, which can easily happen. Nor does one want to
- see old messages percolating indefinitely through the system.
-
- This problem is handled by a simple brute-force mechanism: each
- node keeps a list of all messages it has seen recently, recording
- origin system, creation date, and original ID#. When downloading,
- messages which have already been seen, or which are too old to be
- remembered, are skipped. Messages can percolate outward through a
- large network with no global routing or control, but do not
- reproduce wildly or cycle indefinitely.
-
-
- The above discussion should make the function of the new fields
- reasonably clear:
-
- o Every message needs a local ID#, so the system can determine if
- a given caller has seen it before.
- o Travelling messages need to carry system of origin, date of
- origin, and original ID# with them, to keep reproduction and
- cycling under control.
-
-
-
- ----------------------------------------------------------------------
-
- PORTABILITY PROBLEMS
-
-
- Check all I/O, modem, console, and file stuff, and especially
- the dPrintf and mPrintf functions.
-
-
-
- ----------------------------------------------------------------------
-
- "Room" records (CTDLROOM.SYS)
-
-
- The rooms are basically indices into ctdlMsg.sys, the message
- file. As noted in the overview, each is essentially an array of
- pointers into the message file. The pointers consist of a 16-bit
- message ID number (we will wrap around at 64K for these purposes)
- together with a 16-bit psuedo-sector offset within ctdlMsg.sys
- telling us where the message begins.
-
- Since messages are number sequentially and written circularly,
- the set of messages existing in ctdlMsg.sys will always form a
- continuous sequence at any given time. Thus, by remembering the ID
- numbers of the oldest and newest messages in the message file, we
- can check to see if a message exists >before< going to disk, saving
- ourselves (and the disk drive) the pain of futile seeks in search of
- the nonexistent. This information is recorded in oldest and newest,
- 32 bit numbers. You'll be seeing more of these...
-
- The newest is simply incremented each time we enter a new
- message in the message files. Oldest is incremented each time we
- overwrite an FF (start-of-message) byte in the course of writing a
- new message into the files. This corresponds to dead reckoning --
- current code never checks to see that the message number of the
- message we are overwriting is what we think it is. In a garbaged
- file with extra FF bytes around, this could cause oldest to count
- too rapidly, eventually perhaps overtaking newest, at which time the
- system will look completely empty. If you suspect something like
- this is going on, just use configur.exe to rebuild accurate numbers.
-
- That should be enough background to tackle a full-scale room. From
- ctdl.h :
-
- struct {
- unsigned char rbgen; /* generation number of room */
- struct rflags rbflags; /* same bits as flags above */
- char rbname[NAMESIZE]; /* name of this room */
- pathSpec rbdirname; /* user directory for room's files */
- struct {
- ulong rbmsgNo; /* every message gets unique# */
- ulong rbmsgLoc; /* sector message starts in */
- } msg[MSGSPERRM];
- } roomBuf;
-
- [Note that all components start with "rb" for roomBuf, to make sure
- we don't accidentally use an offset in the wrong structure. Be very
- careful also to get a meaningful sequence of components -- C86
- provides no checking on this sort of stuff either.]
-
- Rbgen handles the problem of rooms which have died and been
- reborn under another name. This will be clearer when we get to the
- log file. For now, just note that each room has a generation number
- which is bumped by one each time it is recycled.
-
- Rbflags is just a bag of flags recording the status of the room.
- The defined flags are:
-
- INUSE. TRUE if the room is valid, FALSE if it is free for
- re-assignment.
- PUBLIC. TRUE if the room is visible by default.
- ISDIR. TRUE if the room is a window onto some disk/userspace.
- PERMROOM. TRUE if the room should not be recycled even if empty.
- (Lobby>, Mail> and all ISDIRs are automatically permanent.)
-
-
- Rbname is just an ASCII string (null-terminated, like all strings)
- giving the name of the room.
-
- Rbdirname is meaningful only in ISDIR rooms, in which case it
- gives the full pathname of the directory to window (in the form
- drive:path.)
-
- Finally, msg is the array of pointers into the message file.
- RbmsgNo is the ID number of the message, and RbmsgLoc is the sector
- it begins in. (For NIL, we stick the value -1 in RbmsgLoc.)
-
-
- RoomTab is just a little index into ctdlRoom.sys, to keep us
- from constantly pounding around on the disk looking for things. It
- basically records the name and status of each room. It is 100%
- redundant with the file itself... as all our indices are. (As all
- indices should be!) Note that RoomTab is a significant consumer of
- RAM all by itself. It is RAM well spent, but if you have to shave
- Citadel a few K to make it fit your system, cutting the number of
- rooms a bit is one try.
-
- The only field new to us in roomTab is rtlastMessage, recording the
- most recent message in the room. When we are searching for rooms
- with messages a given caller hasn't seen, we can check this number
- in RAM and avoid unprofitable disk accesses.
-
-
- ----------------------------------------------------------------------
-
- log records (CTDLLOG.SYS)
-
-
- This is the fun one. Get some fresh air and plug in your
- thinking cap first. (Time, space and complexity are the eternal
- software rivals. We've got 256 log entries x about 500 messages
- spread over up to 128 rooms to worry about, and with floppies disk
- access time is important... so perforce, we opt for lots of
- complexity to keep time and space in bounds.)
-
- To understand what is happening in the log code takes a little
- persistence. You also have to disentangle the different activities
- going on and tackle them one by one.
-
- o We want to remember some random things such as terminal screen
- size, and automatically set them up for each caller at login.
-
- o We want to be able to locate all new messages, and only new
- messages, efficiently. Messages should stay new even if it
- takes a caller a couple of calls to get around to them.
-
- o We want to remember which private rooms a given caller knows
- about, and treat them as normal rooms. This means mostly
- automatically seeking out those with new messages. (Obviously,
- we >don't< want to do this for unknown private rooms!) This
- has to be secure against the periodic recycling of rooms
- between calls.
-
- o We want to support private mail to a caller.
-
- o We want to provide some protection of this information (via
- passwords at login) and some assurance that messages are from
- who they purport to be from (within the system -- one shouldn't
- be able to forge messages from established users).
-
- Lifting another page from ctdl.h gives us:
-
- struct logBuffer {
- unsigned char lbnulls; /* # nulls to print after newline */
- struct lflags lbflags; /* UCMASK, LFMASK, EXPERT */
- unsigned char lbwidth; /* terminal width */
- char lbname[NAMESIZE]; /* caller's name */
- char lbpw[NAMESIZE]; /* caller's password */
- int lbgen[MAXROOMS]; /* 5 bits gen, 3 bits lastVisit */
- ulong lbvisit[MAXVISIT]; /* newestLo on last few calls */
- ulong lbslot[MAILSLOTS]; /* for private mail */
- ulong lbId[MAILSLOTS]; /* for private mail */
- }
-
- Looks simple enough, doesn't it? One topic at a time:
-
- RANDOM CONFIGURATION PARAMETERS
-
- These are in the first three fields in the record. Lbnulls
- gives the number of nulls to print after a newline, for folks who
- like simultaneous hardcopy. Or any remaining ASR33 aficionados
- calling up... Lbwidth is the caller's screen width. We format all
- messages to this width, as best we can. Lbflags is another bit-bag,
- recording uppercase-only folks, people who need a linefeed after a
- carraige-return, people who want to suppress the little automatic
- hints all through the system, and people who like to see the time a
- message was created.
-
-
- FINDING NEW MESSAGES
-
- This is the most important. Thus, it winds up being the most
- elaborate. Conceptually, what we would like to do is mark each
- message with a bit after our caller has read it, so we can avoid
- printing it out again next call. Unfortunately, with 256 log
- entries this would require adding two sectors to each message... and
- we'd wind up reading off disk lots of messages which would never get
- printed. So we resort to an arcane mixture of approximation and low
- animal cunning.
-
- The approximation comes in doing things at the granularity of
- rooms rather than messages. Messages in a given room are "new"
- until we visit it, and "old" after we leave the room... whether we
- read any of them or not. This can actually be defended: anyone who
- passes through a room without reading the contents probably just isn't
- interested in the topic, and would just as soon not be dragged back
- every visit and forced to read them. Given that messages are
- numbered sequentially, we can simply record the most recent message
- ID# of each room as of the last time we visited it. With 128 rooms,
- this would give us (for each user) an array of 128 integers, or 256
- bytes.
-
- This is still too much -- I'd like the complete log record for a
- user to be 256 bytes or less, and we have other stuff to do yet.
-
- So, we complicate a little more. We record in lbvisit[MAXVISIT]
- the most recent message ID# in the system on each of the last six
- calls or so. Now, for each room, we can just indicate which call
- we last visited the room on. This takes 3 bits per room, which we
- stash in the low three bits of lbgen[MAXROOMS]. Now we're down to
- 128 rooms x 3 bits (plus a few bytes in lbvisit[], of course), which
- is quite reasonable.
-
- Putting it all together, we can now compute whether a given room
- has new messages for our current caller without going to disk at all:
-
- > We get the lbgen[] entry for the room in question
- > We mask off the lower 3 bits
- > We use the result as an index into lbvisit[], getting the ID number
- of the most recent message in the system as of the last time we
- visited the room.
- > We compare this with roomTab[].rtlastMessage, which tells us
- what the most recent message in the room is currently.
-
-
- REMEMBERING WHICH PRIVATE ROOMS TO VISIT
-
- This looks trivial at first glance -- just record one bit per
- room per caller in the log records. The problem is that rooms get
- recycled periodically, and we'd rather not run through 256 log
- entries each time we do it. So we adopt a kludge which should
- work 99% of the time.
-
- As previously noted, each room has a generation number, which is
- bumped by one each time it is recycled. As not noted, this
- generation number runs from 0 -> 31 (and then wraps around and
- starts over). Thus, these numbers take 5 bits to represent. By a
- miraculous coincidence, we have exactly 5 bits left in the lbgen[]
- entries in the log records. [Anyone familiar with "capability"
- pointers will be encountering deja vu here...]
-
- When someone visits a room, we set the generation number in lbgen[]
- equal to that of the room. This flags the room as being
- available. If the room gets recycled, on our next visit the two
- generation numbers will no longer match, and the room will no longer
- be available -- just the result we're looking for. (Naturally, if a
- room is marked as PUBLIC, all this stuff is irrelevant.)
-
- This leaves only the problem of an accidental matchup between
- the two numbers giving someone access to a Forbidden Room. We can't
- eliminate this danger completely, but it can be reduced to
- insignificance for most purposes. (Just don't bet megabucks on the
- security of this system!) Each time someone logs in, we set all
- "wrong" generation numbers to be one less than the actual generation
- of the room. This means that the room must be recycled thirty-one
- times before an accidental matchup can be achieved. (We do this for
- all rooms, INUSE or dead, public or private, since any of them may
- be reincarnated as a Forbidden Room.)
-
- Thus, for someone to accidentally be lead to a Forbidden Room,
- they must establish an account on the system, then not call until
- some room has been recycle thirty-one to thirty-two times, which
- room must be reincarnated as a Forbidden Room, which someone must
- now call back (having not scrolled off the userlog in the mean time)
- and read new messages. The last clause is about the only probable
- one in the sequence. The danger of this is much less than the danger
- that someone will simply guess the name of the room outright...
-
-
- SUPPORTING PRIVATE MAIL
-
- Can one have an elegant kludge? This must come pretty close.
-
- Private mail is sent and recieved in the Mail> room, which
- otherwise behaves pretty much as any other room. To make this work,
- we store the actual message pointers in lbslot[] and lbId[] in the
- caller's log record, and then copy them into the Mail> room array
- whenever we enter the room. This requires a little fiddling to get
- things just right. We have to update roomTab[MAILROOM].rtlastMessage
- at login to reflect the presence or absence of new messages, for
- example. And MakeMessage() has to be kludged to ask for the name of
- the recipient of the message whenever a message is entered in Mail>.
- But basically it works pretty well, keeping the code and user
- interface simple and regular.
-
-
- PASSWORDS AND NAME VALIDATION
-
- LogTab[] indexes CTDLLOG.SYS, giving us a quick way of finding
- people. It is basically a chronologically sorted hash table. We
- keep a two-byte hash of the name and password of each caller in RAM.
- When someone tries to log in, we just whip through the table in
- order looking for matches on the password hash and loading the
- matching logfile entry in. Bogus hits are eliminated by the simple
- expedient of refusing to acknowledge a new user who's name or
- password hashes on top of an existing user. Computer chauvinism at
- it's best...
-
- This makes it difficult to forge messages from an existing user.
- (Fine point: nonprinting characters are converted to printing
- characters, and leading, trailing, and double blanks are deleted.)
-